Distributed Deep Learning Using Synchronous Stochastic Gradient Descent
Abstract
We design and implement a distributed multinode synchronous SGD algorithm without altering hyperparameters, compressing data, or changing algorithmic behavior. We perform a detailed analysis of scaling and identify optimal design points for different networks. We demonstrate scaling of CNNs on hundreds of nodes and present what we believe to be record training throughputs: a VGG-A CNN trained with a minibatch of 512 scales 90X on 128 nodes, while VGG-A and OverFeat-FAST trained with a minibatch of 256 scale 53X and 42X, respectively, on a 64-node cluster. We also demonstrate the generality of our approach via best-in-class 6.5X scaling for a 7-layer DNN on 16 nodes. Finally, we attempt to democratize deep learning by training on an Ethernet-based AWS cluster, showing 14X scaling on 16 nodes.
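For readers unfamiliar with the approach, the core loop of synchronous data-parallel SGD can be summarized in a short sketch. The fragment below is illustrative only, not the paper's implementation: it assumes an MPI cluster reachable through mpi4py, a toy least-squares model, and a 512-sample global minibatch (as in the VGG-A experiment) split evenly across ranks.

# Minimal sketch of synchronous data-parallel SGD (not the paper's code).
# Assumes mpi4py and NumPy, and that the worker count divides 512 evenly.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nworkers = comm.Get_rank(), comm.Get_size()

rng = np.random.default_rng(seed=rank)
w = np.zeros(10)                      # model replicated on every node
lr = 0.01                             # hyperparameters stay unchanged

for step in range(100):
    # Each worker draws its shard of the 512-sample global minibatch.
    x = rng.standard_normal((512 // nworkers, 10))
    y = x @ np.ones(10)               # synthetic regression targets
    grad_local = 2 * x.T @ (x @ w - y) / len(x)

    # Synchronous step: sum gradients across all nodes, then average,
    # so every replica applies the identical update each iteration.
    grad_global = np.empty_like(grad_local)
    comm.Allreduce(grad_local, grad_global, op=MPI.SUM)
    w -= lr * (grad_global / nworkers)

Because the allreduced gradient is identical on every replica, each node applies the same update it would in a single-node run, which is the "no change in algorithmic behavior" property the abstract emphasizes.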
Similar Resources
Probabilistic Synchronous Parallel
Most machine learning and deep neural network algorithms rely on iterative methods, e.g. Stochastic Gradient Descent (SGD), to optimise their utility/cost functions. In distributed learning, the networked nodes must work collaboratively to update the model parameters, and the way they proceed is referred to as the synchronous parallel design (or barrier control). Synchronous parall...
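The barrier-control idea can be made concrete with a toy predicate. The sketch below is an assumption-laden illustration, not code from the paper (the function name and the staleness bound are invented here): it shows how a bounded-staleness rule generalizes the fully synchronous barrier.

# Toy sketch of barrier control (not from the paper): a worker may proceed
# only if it is at most `staleness_bound` steps ahead of the slowest worker.
# With staleness_bound = 0 this degenerates to fully synchronous (BSP) SGD.
def may_proceed(worker_step, all_worker_steps, staleness_bound=0):
    return worker_step - min(all_worker_steps) <= staleness_bound

# Example: a worker at step 5 must wait if another worker is still at step 3
# under a bound of 1, but may continue under a bound of 2.
print(may_proceed(5, [5, 3, 4], staleness_bound=1))  # False
print(may_proceed(5, [5, 3, 4], staleness_bound=2))  # True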
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
Deep Neural Networks (DNNs) are becoming an important tool in modern computing applications. Accelerating their training is a major challenge and techniques range from distributed algorithms to low-level circuit design. In this survey, we describe the problem from a theoretical perspective, followed by approaches for its parallelization. Specifically, we present trends in DNN architectures and th...
How to scale distributed deep learning?
Training time on large datasets for deep neural networks is the principal workflow bottleneck in a number of important applications of deep learning, such as object classification and detection in automatic driver assistance systems (ADAS). To minimize training time, the training of a deep neural network must be scaled beyond a single machine to as many machines as possible by distributing the ...
High Throughput Synchronous Distributed Stochastic Gradient Descent
We introduce a new, high-throughput, synchronous, distributed, data-parallel, stochastic-gradient-descent learning algorithm. This algorithm uses amortized inference in a compute-cluster-specific, deep, generative, dynamical model to perform joint posterior predictive inference of the mini-batch gradient computation times of all worker nodes in a parallel computing cluster. We show that a synchro...
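As a rough intuition for why modeling worker-node compute times helps, consider the toy estimator below. It is only a stand-in for the paper's amortized-inference model (the class and method names are hypothetical): it tracks each worker's observed gradient computation times and predicts the duration of a synchronous step, which is gated by the slowest worker.

# Toy illustration (not the paper's generative dynamical model): keep the
# observed gradient computation times of each worker and use their means to
# predict how long a synchronous step will take if we wait for all workers.
import numpy as np

class WorkerTimeModel:
    def __init__(self, nworkers):
        self.times = [[] for _ in range(nworkers)]

    def observe(self, worker, seconds):
        self.times[worker].append(seconds)

    def predicted_step_time(self):
        # A synchronous step waits for the slowest worker, so the predicted
        # step time is the max of the per-worker mean estimates.
        return max(np.mean(t) for t in self.times if t)

model = WorkerTimeModel(4)
for w, t in [(0, 1.0), (1, 1.1), (2, 0.9), (3, 2.5)]:  # worker 3 straggles
    model.observe(w, t)
print(model.predicted_step_time())  # 2.5: the straggler dominates the step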
MiMatrix: A Massively Distributed Deep Learning Framework on a Petascale High-density Heterogeneous Cluster
In this paper, we present a co-designed petascale high-density GPU cluster to expedite distributed deep learning training with synchronous Stochastic Gradient Descent (SSGD). The architecture of our heterogeneous cluster is inspired by the Harvard architecture. According to their different roles in the system, nodes are configured with different specifications. Based on the topology of the whole system's ...
Journal: CoRR
Volume: abs/1602.06709
Issue: -
Pages: -
Publication date: 2016